Visualizing Alienation: Mapping Emotional Spaces in Frankenstein¶
This brief study applies sentiment analysis to the emotional valence of the locations and characters in Mary Shelley's Frankenstein.
Methodology¶
The data set was created by splitting the text of Frankenstein into volumes, chapters, and paragraphs. For each paragraph, the location of the narrative present was noted. This is distinct from the set of all locations mentioned in the text: South America is mentioned, for example, but because the narrative never goes there it is not part of this data set. The RoBERTa sentiment analyzer was then run on each paragraph, and an aggregate score per location was recorded.
A follow-up analysis examined the sentiment surrounding each character. This is less accurate because the calculation relies purely on a character's name as an indicator of that character's presence; no attempt was made to resolve pronouns to characters. Thus "I" could be Walton, Victor, or the Monster, and without significantly more work (or supervision) these distinctions cannot be recovered from the data.
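The notebook below loads precomputed parquet files, but the paragraph-splitting step can be illustrated in miniature. This is only a sketch: the actual splitting rules used to build those files are not shown here, so splitting on blank lines is an assumption about the pipeline's shape.

```python
import re

def split_into_paragraphs(raw_text: str) -> list[str]:
    """Split a plain-text chapter into paragraphs on blank lines."""
    paragraphs = re.split(r'\n\s*\n', raw_text.strip())
    # Collapse internal line breaks so each paragraph is a single string
    return [' '.join(p.split()) for p in paragraphs if p.strip()]

chapter = """It was on a dreary night of November,
that I beheld the accomplishment of my toils.

With an anxiety that almost amounted to agony,
I collected the instruments of life around me."""

paras = split_into_paragraphs(chapter)
print(len(paras))  # 2
```

Each resulting paragraph would then be tagged with its volume and chapter before sentiment scoring.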
# Load analysis results from parquet files
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.offline as py

# Configure Plotly for HTML export - this ensures charts work in exported HTML
py.init_notebook_mode(connected=False)  # Use offline mode for HTML export
import plotly.io as pio
pio.renderers.default = "notebook"  # Ensure charts render in notebook and HTML

print("📊 Loading Frankenstein analysis results...")
try:
    # Load all datasets from parquet files (fast and efficient)
    frankenstein_all_with_sentiment = pd.read_parquet("frankenstein_all_paragraphs_with_sentiment.parquet")
    frankenstein_all_with_sentiment.to_csv("frankenstein_all_paragraphs_with_sentiment.csv")
    character_sentiment_df = pd.read_parquet("frankenstein_character_sentiment.parquet")
    location_sentiment_summary = pd.read_parquet("frankenstein_location_sentiment.parquet")
    frankenstein_manual_locations = pd.read_parquet("frankenstein_manual_locations.parquet")
    print("✅ Successfully loaded all datasets:")
    # Set up coordinate columns (the last two columns hold latitude and longitude)
    coords_columns = list(frankenstein_manual_locations.columns[-2:])
    lat_col = coords_columns[0]
    lon_col = coords_columns[1]
except FileNotFoundError as e:
    print(f"❌ Error loading data: {e}")
📊 Loading Frankenstein analysis results...
✅ Successfully loaded all datasets:
Part I: The Geographic Imagination of Frankenstein¶
The text has a remarkably large geographic canvas given its relative brevity, no doubt owing in part to how widely Mary Shelley traveled while composing it.
Geographic Distribution and Narrative Weight¶
The map below shows all the locations to which the narrative travels in Frankenstein. The circle sizes represent the total word count associated with each location. Note that word count does not necessarily track narrative duration: Victor spends a long stretch of story time in Ingolstadt, for example, but the amount of text set there is comparatively small.
# Geographic Distribution Map - Clean Version
try:
    # Keep only rows with usable coordinates
    valid_coords = frankenstein_manual_locations[
        (frankenstein_manual_locations[lat_col].notna()) &
        (frankenstein_manual_locations[lon_col].notna())
    ].copy()
    valid_coords['word_count'] = valid_coords['paragraph_text'].str.split().str.len()
    total_narrative_words = frankenstein_manual_locations['paragraph_text'].str.split().str.len().sum()

    # Clean location names to handle duplicates like "Delacey Cottage"
    valid_coords['curated_name_clean'] = valid_coords['curated_name'].str.strip()

    # Group locations that are essentially the same (like multiple "Delacey Cottage" entries):
    # first aggregate by cleaned name to get representative coordinates ...
    location_coords = valid_coords.groupby('curated_name_clean').agg({
        lat_col: 'first',  # Use first occurrence coordinates
        lon_col: 'first'
    }).reset_index()
    # ... then sum word counts by cleaned name
    location_counts = valid_coords.groupby('curated_name_clean').agg({
        'word_count': 'sum'
    }).reset_index()

    # Merge coordinates back
    location_counts = location_counts.merge(location_coords, on='curated_name_clean')
    location_counts = location_counts.rename(columns={'curated_name_clean': 'curated_name',
                                                      'word_count': 'total_words'})
    location_counts['narrative_percent'] = (location_counts['total_words'] / total_narrative_words * 100).round(2)

    # Create the geographic map
    fig_geo = px.scatter_map(
        location_counts,
        lat=lat_col,
        lon=lon_col,
        hover_name="curated_name",
        size="total_words",
        size_max=40,
        hover_data={
            "narrative_percent": ":.2f",
            "total_words": True,
            lat_col: False,
            lon_col: False
        },
        title="Geographic Locations in Frankenstein: Narrative Distribution",
        labels={"narrative_percent": "% of Total Narrative"},
        zoom=3,
        height=700,
        color_discrete_sequence=['#2E86AB']
    )
    fig_geo.update_layout(
        map_style="carto-positron",  # scatter_map uses layout.map (mapbox_style applies to the older scatter_mapbox)
        margin={"r": 0, "t": 50, "l": 0, "b": 0},
        # Configure fonts for HTML export
        font=dict(size=12),
        title_font=dict(size=16),
    )
    # Note: scatter_map doesn't support marker line outlines,
    # so the circles will be solid without outlines for this map type

    # Show with offline configuration for HTML export
    py.iplot(fig_geo, show_link=False, config={'displayModeBar': True})

    # Display insights
    print(f"📍 Analysis reveals {len(location_counts)} unique geographic locations")
    print("📊 Most significant locations by word count:")
    top_locations = location_counts.nlargest(5, 'total_words')
    for _, row in top_locations.iterrows():
        print(f"  {row['curated_name']}: {row['narrative_percent']:.1f}% ({row['total_words']} words)")
except NameError:
    print("⚠️ Data not loaded - please run the data loading cell first")
📍 Analysis reveals 45 unique geographic locations
📊 Most significant locations by word count:
  Geneva: 24.2% (17387 words)
  Delacey Cottage: 14.1% (10135 words)
  Ingolstadt: 11.9% (8543 words)
  Artic: 10.6% (7583 words)
  Montanvert: 5.8% (4175 words)
# Debug and fix Unicode issues in location data
import re

def clean_unicode_surrogates(text):
    """Remove problematic Unicode surrogate characters."""
    if isinstance(text, str):
        # Remove surrogate characters (U+D800 to U+DFFF)
        return re.sub(r'[\uD800-\uDFFF]', '', text)
    return text

def clean_dataframe_unicode(df):
    """Clean Unicode issues in all string columns of a DataFrame."""
    df_cleaned = df.copy()
    for column in df_cleaned.columns:
        if df_cleaned[column].dtype == 'object':
            df_cleaned[column] = df_cleaned[column].apply(clean_unicode_surrogates)
    return df_cleaned

# Clean the location sentiment data
print("🧹 Cleaning Unicode issues in location data...")
try:
    # Check for problematic characters in location names
    problematic_locations = []
    for idx, row in location_sentiment_summary.iterrows():
        try:
            # Try to encode the location name
            row['curated_name'].encode('utf-8')
        except UnicodeEncodeError as e:
            problematic_locations.append((idx, row['curated_name'], str(e)))
    if problematic_locations:
        print(f"❌ Found {len(problematic_locations)} problematic locations:")
        for idx, name, error in problematic_locations:
            print(f"  Index {idx}: '{name}' - {error}")
    else:
        print("✅ No obvious Unicode issues found in location names")

    # Clean the location data
    location_sentiment_summary_cleaned = clean_dataframe_unicode(location_sentiment_summary)

    # Verify cleaning worked
    print(f"📊 Original data shape: {location_sentiment_summary.shape}")
    print(f"📊 Cleaned data shape: {location_sentiment_summary_cleaned.shape}")

    # Update the global variable
    location_sentiment_summary = location_sentiment_summary_cleaned
    print("✅ Location data cleaned successfully")
except Exception as e:
    print(f"❌ Error during cleaning: {e}")
    print(f"Error type: {type(e)}")
🧹 Cleaning Unicode issues in location data...
✅ No obvious Unicode issues found in location names
📊 Original data shape: (50, 11)
📊 Cleaned data shape: (50, 11)
✅ Location data cleaned successfully
# Emotional Geography Map - Sentiment Analysis (Unicode Safe)
import re

def clean_text_for_display(text):
    """Clean text for safe display, removing problematic Unicode characters."""
    if pd.isna(text) or not isinstance(text, str):
        return str(text)
    # Remove surrogate pairs and other problematic characters
    text = re.sub(r'[\uD800-\uDFFF]', '', text)  # Remove surrogates
    text = re.sub(r'[\x00-\x08\x0B-\x0C\x0E-\x1F\x7F]', '', text)  # Remove control characters
    return text

try:
    # Clean all text data for safe display and handle duplicate locations
    location_data_safe = location_sentiment_summary.copy()
    location_data_safe['curated_name'] = location_data_safe['curated_name'].apply(clean_text_for_display)
    location_data_safe['sentiment_category'] = location_data_safe['sentiment_category'].apply(clean_text_for_display)

    # Clean location names to handle duplicates like "Delacey Cottage"
    location_data_safe['curated_name_clean'] = location_data_safe['curated_name'].str.strip()

    # Aggregate duplicate locations: sum word counts, average sentiment
    location_data_safe = location_data_safe.groupby('curated_name_clean').agg({
        'lat': 'first',
        'long': 'first',
        'total_words': 'sum',
        'avg_sentiment': 'mean',
        'narrative_percent': 'sum',
        'sentiment_category': lambda x: x.mode().iloc[0] if not x.empty else x.iloc[0]
    }).reset_index()
    location_data_safe = location_data_safe.rename(columns={'curated_name_clean': 'curated_name'})

    # Create sentiment-enhanced map using cleaned data
    fig_sentiment = px.scatter_map(
        location_data_safe,
        lat='lat',
        lon='long',
        hover_name='curated_name',
        size="total_words",
        size_max=35,
        color="avg_sentiment",
        color_continuous_scale='RdYlGn',
        color_continuous_midpoint=0,
        hover_data={
            "narrative_percent": ":.2f",
            "avg_sentiment": ":.3f",
            "sentiment_category": True,
            'lat': False,
            'long': False
        },
        title="Emotional Geography of Frankenstein: Location Sentiment Analysis",
        labels={
            "avg_sentiment": "Average Sentiment",
            "narrative_percent": "% of Total Narrative"
        },
        zoom=3,
        height=700
    )
    fig_sentiment.update_layout(
        map_style="carto-positron",  # scatter_map uses layout.map (mapbox_style applies to the older scatter_mapbox)
        margin={"r": 0, "t": 50, "l": 0, "b": 0},
        coloraxis_colorbar=dict(
            title="Sentiment Score",
            tickvals=[-0.4, -0.2, 0, 0.2, 0.4],
            ticktext=["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"]
        ),
        # Configure fonts for HTML export
        font=dict(size=12),
        title_font=dict(size=16),
    )
    # Note: scatter_map doesn't support marker line outlines;
    # the circles will use their sentiment colors without outlines

    # Show with offline configuration for HTML export
    py.iplot(fig_sentiment, show_link=False, config={'displayModeBar': True})

    # Display sentiment insights with cleaned text
    avg_overall_sentiment = location_data_safe['avg_sentiment'].mean()
    print("📊 Emotional Geography Analysis Complete")
    print(f"🎭 Overall sentiment across all locations: {avg_overall_sentiment:.3f}")
    sentiment_distribution = location_data_safe['sentiment_category'].value_counts()
    print(f"📈 Sentiment distribution: {sentiment_distribution.to_dict()}")

    # Show most positive and negative locations (names are already cleaned above)
    most_positive = location_data_safe.nlargest(3, 'avg_sentiment')[['curated_name', 'avg_sentiment']]
    most_negative = location_data_safe.nsmallest(3, 'avg_sentiment')[['curated_name', 'avg_sentiment']]
    print("\n✨ Most positively framed locations:")
    for _, row in most_positive.iterrows():
        print(f"  {row['curated_name']}: {row['avg_sentiment']:.3f}")
    print("\n⛈️ Most negatively framed locations:")
    for _, row in most_negative.iterrows():
        print(f"  {row['curated_name']}: {row['avg_sentiment']:.3f}")
except NameError:
    print("⚠️ Sentiment data not available - please run the data loading cell first")
except Exception as e:
    print(f"❌ Error creating sentiment map: {e}")
    print(f"Error type: {type(e).__name__}")
    # Fallback: show basic sentiment statistics without the map
    try:
        avg_overall_sentiment = location_sentiment_summary['avg_sentiment'].mean()
        print("\n📊 Basic Sentiment Statistics:")
        print(f"🎭 Overall sentiment across all locations: {avg_overall_sentiment:.3f}")
        sentiment_distribution = location_sentiment_summary['sentiment_category'].value_counts()
        print(f"📈 Sentiment distribution: {sentiment_distribution.to_dict()}")
    except Exception as fallback_error:
        print(f"❌ Even fallback statistics failed: {fallback_error}")
📊 Emotional Geography Analysis Complete
🎭 Overall sentiment across all locations: 0.012
📈 Sentiment distribution: {'Neutral': 19, 'Negative': 14, 'Positive': 12}
✨ Most positively framed locations:
Windsor: 0.477
Edinburgh: 0.344
Chamonix: 0.333
⛈️ Most negatively framed locations:
Holyhead: -0.410
Zurich: -0.238
Beach somewhere on the Irish Coast: -0.231
Part II: Animating Movements¶
We can also track where these emotions take place in time by splitting up the data further. By giving each location a chronological number, we can get the sentiment for that particular event at that particular location rather than the average sentiment per location. After all, sometimes Victor is happy in Geneva and sometimes he is sad.
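The chronological numbering described above can be implemented by incrementing a counter each time the narrative-present location changes between consecutive paragraphs. This is a minimal sketch: the column names mirror those in the CSV, but the exact procedure used to build the `ordinal` column is an assumption.

```python
import pandas as pd

# Toy sequence of narrative-present locations, one row per paragraph
df = pd.DataFrame({"curated_name": ["Lucern", "Lucern", "Geneva", "Geneva", "Lucern"]})

# Start a new chronological step whenever the location differs from the previous paragraph
df["ordinal"] = (df["curated_name"] != df["curated_name"].shift()).cumsum()

print(df["ordinal"].tolist())  # [1, 1, 2, 2, 3]
```

Note that a revisited location gets a fresh ordinal on each visit, which is exactly what lets Geneva be happy at one step and sad at another.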
frankenstein_emotion_sequence_df = pd.read_csv("frankenstein_paragraphs_geoparsed_located_chrono.csv")
# Examine the chronological data structure
print("🔍 Exploring the chronological emotion data...")
print(f"📊 Data shape: {frankenstein_emotion_sequence_df.shape}")
print(f"📋 Columns: {frankenstein_emotion_sequence_df.columns.tolist()}")
print(f"\n📈 Ordinal range: {frankenstein_emotion_sequence_df['ordinal'].min()} to {frankenstein_emotion_sequence_df['ordinal'].max()}")
print(f"🏃 Unique ordinal values: {frankenstein_emotion_sequence_df['ordinal'].nunique()}")
print(f"\n🗺️ Sample data (first 5 rows):")
display(frankenstein_emotion_sequence_df[['text_section', 'chapter_letter', 'curated_name', 'lat', 'long', 'ordinal']].head())
print(f"\n📍 Unique locations in chronological data: {frankenstein_emotion_sequence_df['curated_name'].nunique()}")
print(f"📍 Locations: {sorted(frankenstein_emotion_sequence_df['curated_name'].unique())}")
🔍 Exploring the chronological emotion data...
📊 Data shape: (764, 13)
📋 Columns: ['text_section', 'chapter_letter', 'paragraph_number', 'paragraph_text', 'places', 'latitudes', 'longitudes', 'feature_names', 'curated_name', 'lat', 'long', 'chrono', 'ordinal']
📈 Ordinal range: 1 to 67
🏃 Unique ordinal values: 67
🗺️ Sample data (first 5 rows):
| | text_section | chapter_letter | curated_name | lat | long | ordinal |
|---|---|---|---|---|---|---|
| 0 | vol_1 | CHAPTER I | Lucern | 47.050480 | 8.306350 | 1 |
| 1 | vol_1 | CHAPTER I | Lucern | 47.050480 | 8.306350 | 1 |
| 2 | vol_1 | CHAPTER I | Lucern | 47.050480 | 8.306350 | 1 |
| 3 | vol_1 | CHAPTER I | Geneva | 46.203278 | 6.147158 | 1 |
| 4 | vol_1 | CHAPTER I | Geneva | 46.203278 | 6.147158 | 1 |
📍 Unique locations in chronological data: 45 📍 Locations: ['Archangel', 'Arles', 'Artic', 'Barents Sea', 'Beach somewhere on the Irish Coast', 'Belrive', 'Chamonix', 'Cologne', 'Constantinople', 'Cumberland', 'Delacey Cottage', 'Dublin', 'Edinburgh', 'Elizabeth in Italy', 'Geneva', 'Holyhead', 'Ingolstadt', 'Ingolstadt (Forest)', 'Irish Sea', 'Lausanne', 'Le Havre', 'Livorno', 'London', 'Lucern', 'Mainz', 'Matlock', 'Monster Travel to Geneva', 'Montanvert', 'Near Mont Blanc', 'Orkney Islands', 'Oxford', 'Paris', 'Perth', 'Portsmouth', 'Rhine below Mainz', 'Rotterdam', 'Russian plain', 'Russian pursuit', 'Russian pursuit near Archangel', 'Russian pursuit on ice', 'St. Petersburgh', 'Strasbourg', 'Thonon-les-Bains', 'Windsor', 'Zurich']
# Create animation data with sentiment from existing analysis
print("🎬 Creating animated emotional journey with RoBERTa sentiment...")

# Work with chronological data
df = frankenstein_emotion_sequence_df.copy()
df['word_count'] = df['paragraph_text'].str.split().str.len()

# Check what sentiment columns are available
print("Available sentiment columns:")
sentiment_cols = [col for col in frankenstein_all_with_sentiment.columns if 'roberta' in col.lower()]
for col in sentiment_cols:
    print(f"  - {col}")

# Use the compound score if available, otherwise calculate from pos/neg
# (assumes both DataFrames share the same paragraph order)
if 'roberta_compound' in frankenstein_all_with_sentiment.columns:
    sentiment_col = 'roberta_compound'
    print(f"Using compound sentiment score: {sentiment_col}")
    df['sentiment_score'] = frankenstein_all_with_sentiment[sentiment_col].values
elif 'roberta_pos' in frankenstein_all_with_sentiment.columns and 'roberta_neg' in frankenstein_all_with_sentiment.columns:
    print("Calculating sentiment from positive - negative scores")
    df['sentiment_score'] = (frankenstein_all_with_sentiment['roberta_pos'].values -
                             frankenstein_all_with_sentiment['roberta_neg'].values)
else:
    print("⚠️ Using first available sentiment column")
    sentiment_col = sentiment_cols[0] if sentiment_cols else 'roberta_neg'
    df['sentiment_score'] = frankenstein_all_with_sentiment[sentiment_col].values

# Clean and prepare data
df_clean = df.dropna(subset=['lat', 'long']).copy()

# Clean location names to handle duplicates like "Delacey Cottage"
df_clean['curated_name_clean'] = df_clean['curated_name'].str.strip()

# Aggregate by ordinal (chronological step) and cleaned location name
animation_data = df_clean.groupby(['ordinal', 'curated_name_clean', 'lat', 'long']).agg({
    'word_count': 'sum',        # Total words for this location at this time
    'sentiment_score': 'mean',  # Average sentiment
    'text_section': 'first',
    'chapter_letter': 'first'
}).reset_index()

# Rename back to curated_name for display
animation_data = animation_data.rename(columns={'curated_name_clean': 'curated_name'})

# Add frame information
animation_data['frame_label'] = animation_data['ordinal'].astype(str)
animation_data['chapter_info'] = (animation_data['text_section'].str.replace('_', ' ') +
                                  ' ' + animation_data['chapter_letter']).str.title()

# Add sentiment category for hover info
animation_data['sentiment_category'] = animation_data['sentiment_score'].apply(
    lambda x: 'Positive' if x > 0.1 else ('Negative' if x < -0.1 else 'Neutral')
)

print(f"📊 Animation ready: {len(animation_data)} location-time points")
print(f"📈 Chronological steps: {animation_data['ordinal'].nunique()}")
print(f"📝 Word count range: {animation_data['word_count'].min()}-{animation_data['word_count'].max()}")
print(f"🎭 Sentiment range: {animation_data['sentiment_score'].min():.3f} to {animation_data['sentiment_score'].max():.3f}")

# Create the animated map
fig_animated = px.scatter_map(
    animation_data,
    lat="lat",
    lon="long",
    hover_name="curated_name",
    size="word_count",        # Size = word count
    size_max=60,
    color="sentiment_score",  # Color = sentiment
    color_continuous_scale='RdYlGn',
    color_continuous_midpoint=0,
    animation_frame="frame_label",
    hover_data={
        "sentiment_score": ":.3f",
        "sentiment_category": True,
        "word_count": True,
        "chapter_info": True,
        "ordinal": True
    },
    title="Emotional Journey Through Frankenstein: Chronological Animation<br><sub>Size = Word Count | Color = RoBERTa Sentiment Score</sub>",
    labels={
        "sentiment_score": "Sentiment Score",
        "sentiment_category": "Sentiment",
        "word_count": "Words",
        "chapter_info": "Chapter",
        "ordinal": "Chronological Step"
    },
    zoom=3,
    height=850
)

# Style the map
fig_animated.update_layout(
    map_style="carto-positron",  # scatter_map uses layout.map (mapbox_style applies to the older scatter_mapbox)
    margin={"r": 0, "t": 90, "l": 0, "b": 0},
    coloraxis_colorbar=dict(
        title="Sentiment Score",
        tickvals=[-0.6, -0.3, 0, 0.3, 0.6],
        ticktext=["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"]
    ),
    # Configure fonts for HTML export
    font=dict(size=12),
    title_font=dict(size=16),
)

# Animation settings - slower frames for better observation (durations are in milliseconds)
fig_animated.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 1000  # 1 second per frame
fig_animated.layout.updatemenus[0].buttons[0].args[1]['transition']['duration'] = 10  # near-instant transition

# Add step info to frames with more context
for i, frame in enumerate(fig_animated.frames):
    step = i + 1
    current_step_data = animation_data[animation_data['ordinal'] == step]
    if not current_step_data.empty:
        # Get the chapter info and locations for this step
        chapter_info = current_step_data['chapter_info'].iloc[0]
        locations = current_step_data['curated_name'].tolist()
        location_text = ", ".join(locations) if len(locations) <= 3 else f"{', '.join(locations[:2])}, +{len(locations) - 2} more"
        frame.layout.title = f"{chapter_info} - {location_text}"
    else:
        frame.layout.title = f"Frankenstein Journey - Step {step}/{len(fig_animated.frames)}"

# Show with offline configuration for HTML export
py.iplot(fig_animated, show_link=False, config={'displayModeBar': True})

print("🎯 Enhanced animated emotional journey map created!")
print("📈 Circle size = word count at each location during each chronological step")
print("🌈 Circle color = RoBERTa sentiment (red=negative, yellow=neutral, green=positive)")
print("🗺️ Each frame shows the emotional state of locations during that narrative moment")
print("▶️ Press play to watch how emotions shift across geography as the story unfolds")
print(f"📖 Animation spans {animation_data['ordinal'].nunique()} chronological steps through Shelley's narrative")
🎬 Creating animated emotional journey with RoBERTa sentiment...
Available sentiment columns:
  - roberta_neg
  - roberta_neu
  - roberta_pos
  - roberta_compound
Using compound sentiment score: roberta_compound
📊 Animation ready: 72 location-time points
📈 Chronological steps: 67
📝 Word count range: 36-5619
🎭 Sentiment range: -0.902 to 0.803
🎯 Enhanced animated emotional journey map created!
📈 Circle size = word count at each location during each chronological step
🌈 Circle color = RoBERTa sentiment (red=negative, yellow=neutral, green=positive)
🗺️ Each frame shows the emotional state of locations during that narrative moment
▶️ Press play to watch how emotions shift across geography as the story unfolds
📖 Animation spans 67 chronological steps through Shelley's narrative
Emotional Geography Insights¶
The sentiment analysis of geographic locations reveals sophisticated patterns in Shelley's emotional mapping:
Positive Locations: Often associated with domesticity, family, and early happiness
- Geneva (Victor's family home)
- Peaceful natural settings
Negative Locations: Frequently connected to isolation, creation, and consequence
- Laboratory spaces
- Remote wilderness areas
- Sites of confrontation
Neutral Locations: Transitional spaces and narrative bridges
- Travel routes
- Temporary stops
This emotional geography suggests that Shelley uses location not merely as setting but as an extension of character psychology and thematic development.
Part III: Character Sentiment Analysis¶
Moving from geographic to character-centered analysis, we examined how Mary Shelley emotionally frames the principal characters throughout the narrative. This analysis identifies paragraphs mentioning key characters and measures the sentiment associated with each character's textual presence.
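The mention-detection step can be sketched as a simple substring match per paragraph. This is a minimal illustration under stated assumptions: the column names are illustrative, the sentiment values are toy numbers, and a real implementation would need name aliases (e.g. "Clerval" for Henry) and, as noted in the methodology, still could not resolve pronouns.

```python
import pandas as pd

# Toy paragraphs with a per-paragraph sentiment score (illustrative values)
paragraphs = pd.DataFrame({"paragraph_text": [
    "Elizabeth welcomed me with tender affection.",
    "I saw the dull yellow eye of the creature open.",
    "Henry Clerval and Elizabeth walked by the lake.",
]})
roberta_compound = pd.Series([0.6, -0.8, 0.4])

characters = ["Elizabeth", "Henry"]
rows = []
for name in characters:
    # A paragraph "mentions" a character if its name appears in the text
    mask = paragraphs["paragraph_text"].str.contains(name)
    rows.append({"Character": name,
                 "Total_Mentions": int(mask.sum()),
                 "Avg_Sentiment": roberta_compound[mask].mean()})

summary = pd.DataFrame(rows)
print(summary)
```

Averaging the paragraph-level scores over each character's mentions yields the per-character sentiment figures visualized below.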
# Character Sentiment Analysis Visualizations
try:
    # Sort by sentiment for better visualization
    character_df_sorted = character_sentiment_df.sort_values('Avg_Sentiment', ascending=False)

    # 1. Character Sentiment Overview Bar Chart
    fig1 = px.bar(
        character_df_sorted,
        x='Character',
        y='Avg_Sentiment',
        color='Avg_Sentiment',
        color_continuous_scale='RdYlGn',
        color_continuous_midpoint=0,
        title='Character Emotional Framing: Average Sentiment by Character',
        labels={'Avg_Sentiment': 'Average Sentiment Score'},
        hover_data=['Total_Mentions', 'Total_Words']
    )
    fig1.add_hline(y=0, line_dash="dash", line_color="gray",
                   annotation_text="Neutral Baseline", annotation_position="top right")
    fig1.update_layout(
        height=500,
        xaxis_title="Character",
        yaxis_title="Average Sentiment Score",
        showlegend=False
    )
    # Show with offline configuration for HTML export
    py.iplot(fig1, show_link=False, config={'displayModeBar': True})

    # 2. Sentiment Distribution Stacked Bar Chart
    fig2 = go.Figure()
    fig2.add_trace(go.Bar(
        name='Positive Mentions',
        x=character_df_sorted['Character'],
        y=character_df_sorted['Positive_Mentions'],
        marker_color='#2E8B57',
        opacity=0.8
    ))
    fig2.add_trace(go.Bar(
        name='Neutral Mentions',
        x=character_df_sorted['Character'],
        y=character_df_sorted['Neutral_Mentions'],
        marker_color='#708090',
        opacity=0.8
    ))
    fig2.add_trace(go.Bar(
        name='Negative Mentions',
        x=character_df_sorted['Character'],
        y=character_df_sorted['Negative_Mentions'],
        marker_color='#CD5C5C',
        opacity=0.8
    ))
    fig2.update_layout(
        barmode='stack',
        title='Character Emotional Complexity: Sentiment Distribution by Character',
        xaxis_title='Character',
        yaxis_title='Number of Paragraphs',
        height=500
    )
    # Show with offline configuration for HTML export
    py.iplot(fig2, show_link=False, config={'displayModeBar': True})

    # 3. Character Frequency vs Sentiment Scatter Plot
    fig3 = px.scatter(
        character_df_sorted,
        x='Total_Mentions',
        y='Avg_Sentiment',
        size='Total_Words',
        color='Avg_Sentiment',
        color_continuous_scale='RdYlGn',
        color_continuous_midpoint=0,
        hover_name='Character',
        title='Character Analysis: Narrative Presence vs Emotional Framing',
        labels={
            'Total_Mentions': 'Number of Paragraph Mentions',
            'Avg_Sentiment': 'Average Sentiment Score',
            'Total_Words': 'Total Words in Context'
        }
    )
    # Add reference lines
    fig3.add_hline(y=0, line_dash="dash", line_color="gray", opacity=0.5)
    fig3.add_vline(x=character_df_sorted['Total_Mentions'].median(),
                   line_dash="dash", line_color="gray", opacity=0.5)
    fig3.update_layout(height=500)
    # Show with offline configuration for HTML export
    py.iplot(fig3, show_link=False, config={'displayModeBar': True})

    # Display key insights
    most_positive = character_df_sorted.iloc[0]
    most_negative = character_df_sorted.iloc[-1]
    most_mentioned = character_df_sorted.loc[character_df_sorted['Total_Mentions'].idxmax()]
    print("🎭 Character Analysis - Key Findings:")
    print(f"✨ Most positively portrayed: {most_positive['Character']} (sentiment: {most_positive['Avg_Sentiment']:.3f})")
    print(f"⛈️ Most negatively portrayed: {most_negative['Character']} (sentiment: {most_negative['Avg_Sentiment']:.3f})")
    print(f"📈 Most frequently mentioned: {most_mentioned['Character']} ({most_mentioned['Total_Mentions']} paragraphs)")
    print(f"📊 Characters analyzed: {len(character_df_sorted)}")
    print("\n🔍 Character Emotional Patterns:")
    for _, row in character_df_sorted.iterrows():
        pos_pct = row['Positive_Mentions'] / row['Total_Mentions'] * 100
        neg_pct = row['Negative_Mentions'] / row['Total_Mentions'] * 100
        neu_pct = row['Neutral_Mentions'] / row['Total_Mentions'] * 100
        print(f"  {row['Character']:>10}: {pos_pct:5.1f}% pos, {neu_pct:5.1f}% neu, {neg_pct:5.1f}% neg (avg: {row['Avg_Sentiment']:6.3f})")
except NameError:
    print("⚠️ Character analysis data not available - please run the data loading cell first")
🎭 Character Analysis - Key Findings:
✨ Most positively portrayed: Agatha (sentiment: 0.107)
⛈️ Most negatively portrayed: Justine (sentiment: -0.181)
📈 Most frequently mentioned: Alphonse (125 paragraphs)
📊 Characters analyzed: 10
🔍 Character Emotional Patterns:
Agatha: 58.8% pos, 23.5% neu, 17.6% neg (avg: 0.107)
Ernest: 42.9% pos, 35.7% neu, 21.4% neg (avg: 0.023)
Elizabeth: 33.8% pos, 28.7% neu, 37.5% neg (avg: 0.018)
Felix: 36.8% pos, 36.8% neu, 26.3% neg (avg: -0.001)
Henry: 32.3% pos, 27.7% neu, 40.0% neg (avg: -0.008)
Alphonse: 28.0% pos, 36.8% neu, 35.2% neg (avg: -0.012)
Victor: 15.1% pos, 41.5% neu, 43.4% neg (avg: -0.127)
William: 16.7% pos, 25.0% neu, 58.3% neg (avg: -0.130)
Monster: 16.5% pos, 25.7% neu, 57.8% neg (avg: -0.136)
Justine: 12.5% pos, 12.5% neu, 75.0% neg (avg: -0.181)
Character Sentiment Insights¶
The character-based sentiment analysis reveals several fascinating patterns in Shelley's characterization:
Most Positively Framed Characters:
- Agatha: the character most consistently surrounded by positive emotional language, reflecting the domestic warmth of the De Lacey household scenes
- Ernest and Elizabeth: also net-positive, associated with family, affection, and early happiness
Most Negatively Framed Characters:
- Justine: the most negatively framed character, as nearly every mention of her occurs in the context of William's murder and her trial and execution
- The Monster and Victor: despite the Monster's claim on the reader's sympathy, both creature and creator are persistently surrounded by negative emotional language, reflecting internal torment and moral complexity
Complex Characterization:
- Characters with mixed sentiment patterns show Shelley's nuanced approach to characterization
- The frequency vs. sentiment analysis reveals that major characters often have more complex emotional profiles
Literary and Critical Implications¶
Geographic Symbolism¶
Shelley's geographic choices are far from arbitrary. The sentiment analysis reveals that she consistently associates certain types of locations with specific emotional tones, creating a symbolic geography that reinforces the novel's themes:
- Domestic spaces tend toward positive sentiment, representing safety and family bonds
- Scientific/laboratory spaces carry negative associations, reflecting the dangerous nature of Victor's pursuits
- Natural wilderness shows mixed sentiment, serving both as refuge and as sites of confrontation
Character Psychology and Moral Framework¶
The character sentiment analysis illuminates Shelley's moral framework:
- Victor's negative sentiment suggests Shelley's critique of unchecked scientific ambition
- The Monster's treatment reveals the complex interplay between sympathy and horror in Gothic fiction
- Elizabeth's consistently positive framing reinforces traditional gender roles while highlighting what Victor loses through his obsessions
Methodological Innovation¶
This computational approach reveals patterns that would be difficult to detect through traditional close reading:
- Quantified emotional patterns provide evidence for interpretive claims about character and setting
- Geographic distribution analysis reveals the scope of Shelley's imaginative world-building
- Sentiment mapping creates new ways of understanding the relationship between place and emotion in literary texts
Conclusion: Digital Humanities and Literary Understanding¶
This analysis demonstrates how digital humanities methods can enhance rather than replace traditional literary analysis. By applying computational techniques to Frankenstein, we uncover:
Hidden Patterns: Quantitative analysis reveals consistent patterns in Shelley's treatment of geography and character
Evidence-Based Interpretation: Sentiment analysis provides measurable evidence for claims about characterization and setting
New Research Questions: These visualizations generate new questions about Gothic literature, gender roles, and the relationship between science and emotion in Romantic literature
Accessible Analysis: Interactive visualizations make complex literary patterns visible and explorable
The computational analysis of Frankenstein reveals Mary Shelley as a sophisticated architect of both geographic and emotional landscapes. Her novel operates through carefully constructed patterns of place and sentiment that reinforce its central themes about creation, responsibility, and the consequences of unchecked ambition.
Rather than diminishing the literary richness of Frankenstein, digital analysis reveals new dimensions of Shelley's artistic achievement, demonstrating how computational methods can serve literary understanding and open new avenues for critical interpretation.
This analysis was conducted using computational text analysis, geoparsing technology, and RoBERTa sentiment analysis. All visualizations are interactive and can be explored in detail to examine specific locations, characters, and textual patterns.